Online learning in MDPs with side information

Authors

  • Yasin Abbasi-Yadkori
  • Gergely Neu
Abstract

We study online learning of finite Markov decision process (MDP) problems when a side information vector is available. The problem is motivated by applications such as clinical trials and recommendation systems. Such applications have an episodic structure, where each episode corresponds to a patient or customer. Our objective is to compete with the optimal dynamic policy that can take side information into account. We propose a computationally efficient algorithm and show that its regret is at most O(√T), where T is the number of rounds. To the best of our knowledge, this is the first regret bound for this setting.


Similar resources

Online Linear Regression and Its Application to Model-Based Reinforcement Learning

We provide a provably efficient algorithm for learning Markov Decision Processes (MDPs) with continuous state and action spaces in the online setting. Specifically, we take a model-based approach and show that a special type of online linear regression allows us to learn MDPs with (possibly kernelized) linearly parameterized dynamics. This result builds on Kearns and Singh’s work that provides ...


User’s Interaction with Information through eFront Learning Management System

Background and Aim: To understand interactive content, content production standards, users' interaction with LMSs, and their behavior in dealing with information, this paper examines how users interact with the information provided in eFront, an open source Learning Management System, with an emphasis on the SCORM standard. Method: The method that used...


Facilitating Internalization in E-Learning Through New Information System

This paper aims to study Vygotsky’s (1987) sociocultural theory of learning with respect to how it relates to technology-based second language learning and teaching. The researchers selected their participants from among advanced students at Payame Noor University. We divided the participants into two groups: an experimental group and a control group. After teaching the course an experimental group...


Markov Decision Processes with Continuous Side Information

We consider a reinforcement learning (RL) setting in which the agent interacts with a sequence of episodic MDPs. At the start of each episode the agent has access to some side-information or context that determines the dynamics of the MDP for that episode. Our setting is motivated by applications in healthcare where baseline measurements of a patient at the start of a treatment episode form the...


Incremental Structure Learning in Factored MDPs with Continuous States and Actions

Learning factored transition models of structured environments has been shown to provide significant leverage when computing optimal policies for tasks within those environments. Previous work has focused on learning the structure of factored Markov Decision Processes (MDPs) with finite sets of states and actions. In this work we present an algorithm for online incremental learning of transitio...



Journal:
  • CoRR

Volume: abs/1406.6812

Publication date: 2014